Testing whether r is statistically significantly different from zero
Before beginning your calculations for correlation coefficients, remember that the data used in a
correlation — the “ingredients” to a correlation — are the values of two variables referring to the
same experimental unit. An example would be measurements of height (X) and weight (Y) in a sample
of individuals. Because your raw data (the X and Y values) always have random fluctuations due to
either sampling error or measurement imprecision, a calculated correlation coefficient is also subject
to random fluctuations.
Even when X and Y are completely independent, your calculated r value is almost never exactly zero.
One way to test for a statistically significant association between X and Y is to test whether r is
statistically significantly different from zero by calculating a p value from the r value (see Chapter 3
for a refresher on p values).
The correlation coefficient has a strange sampling distribution, so it is not useful for statistical testing. Instead, the quantity t can be calculated from the observed correlation coefficient r, based on N observations, by the formula t = r√(N − 2)/√(1 − r²). Because t fluctuates in accordance with the Student t distribution with N − 2 degrees of freedom (df), it is useful for statistical testing (see Chapter 11 for more about t).
For example, if r = 0.500 for a sample of 12 participants, then t = 0.500√(12 − 2)/√(1 − 0.500²), which works out to t = 1.8257, with 10 degrees of freedom. You can use the online calculator at https://statpages.info/pdfs.html and calculate the p value by entering the t and df values. You can also do this in R by using the code:
2 * pt(q = 1.8257, df = 10, lower.tail = FALSE)
Either way, you get p = 0.098, which is greater than 0.05. At α = 0.05, the r value of 0.500 is not
statistically significantly different from zero (see Chapter 12 for more about α).
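As a cross-check on the arithmetic above, the t statistic can also be computed in a few lines of Python (a sketch using only the standard library; the function name t_from_r is just for illustration — the p value itself still comes from a t table or the R call shown above):

```python
from math import sqrt

def t_from_r(r, n):
    """t statistic for testing whether a correlation r, based on n pairs, differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r**2)

# The worked example: r = 0.500, N = 12
t = t_from_r(0.500, 12)
df = 12 - 2
print(round(t, 4), df)  # 1.8257 with 10 degrees of freedom
```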
How precise is an r value?
You can calculate confidence limits around an observed r value using a somewhat roundabout process.
The quantity z, calculated by the Fisher z transformation z = (1/2)ln[(1 + r)/(1 − r)], is approximately normally distributed with a standard deviation of 1/√(N − 3). Therefore, using the formulas for normal-based confidence intervals (see Chapter 10), you can calculate the lower and upper 95 percent confidence limits around z: z_lower = z − 1.96/√(N − 3) and z_upper = z + 1.96/√(N − 3). You can turn these into the corresponding confidence limits around r by the reverse of the z transformation: r = (e^(2z) − 1)/(e^(2z) + 1), evaluated for z = z_lower and z = z_upper.
Here are the steps for calculating 95 percent confidence limits around an observed r value of 0.500 for a sample of 12 participants (N = 12):
1. Calculate the Fisher z transformation of the observed r value:
z = (1/2)ln[(1 + 0.500)/(1 − 0.500)] = 0.5493
2. Calculate the lower and upper 95 percent confidence limits for z:
z_lower = 0.5493 − 1.96/√(12 − 3) = −0.1040 and z_upper = 0.5493 + 1.96/√(12 − 3) = 1.2026
3. Calculate the lower and upper 95 percent confidence limits for r:
r_lower = (e^(2 × (−0.1040)) − 1)/(e^(2 × (−0.1040)) + 1) = −0.104 and r_upper = (e^(2 × 1.2026) − 1)/(e^(2 × 1.2026) + 1) = 0.834
So the 95 percent confidence interval around r = 0.500 runs from −0.104 to 0.834. Because this interval includes zero, it agrees with the earlier test, which found that r was not statistically significantly different from zero.
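The three steps can be sketched in Python. Conveniently, math.atanh and math.tanh are exactly the Fisher z transformation and its inverse, so no hand-coded logarithms are needed (the function name r_confidence_interval is illustrative, not a standard-library function):

```python
from math import atanh, tanh, sqrt

def r_confidence_interval(r, n, z_crit=1.96):
    """95 percent confidence limits around an observed correlation r based on n pairs."""
    z = atanh(r)                  # Step 1: Fisher z, equal to (1/2)ln[(1 + r)/(1 - r)]
    se = 1 / sqrt(n - 3)          # standard deviation of z
    z_lo = z - z_crit * se        # Step 2: normal-based confidence limits for z
    z_hi = z + z_crit * se
    return tanh(z_lo), tanh(z_hi) # Step 3: reverse transformation back to the r scale

lo, hi = r_confidence_interval(0.500, 12)
print(round(lo, 3), round(hi, 3))  # -0.104 0.834
```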